This paper examines, both theoretically and empirically, whether using ordinary least squares (OLS) multivariate regression models to estimate average treatment effects under experimental designs is justified by the Neyman model for causal inference. The paper finds that estimated standard errors and significance levels for average treatment effect estimators are similar under the OLS and Neyman models when baseline covariates are included in the models, even though theory suggests that they could differ.
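In the simplest case, with no covariates, the two variance estimators can be compared directly. The sketch below is only an illustration, not the paper's covariate-adjusted estimators: it contrasts the conservative Neyman variance of the difference in means (s1²/n1 + s0²/n0) with the homoskedastic OLS variance of the treatment coefficient in a regression of the outcome on an intercept and a treatment dummy. The function names and data are hypothetical.

```python
from statistics import mean, variance  # variance() uses the n - 1 divisor


def impact(y_t, y_c):
    """Difference in mean outcomes between treatment and control groups."""
    return mean(y_t) - mean(y_c)


def neyman_variance(y_t, y_c):
    """Conservative Neyman variance estimate: s1^2/n1 + s0^2/n0."""
    return variance(y_t) / len(y_t) + variance(y_c) / len(y_c)


def ols_variance(y_t, y_c):
    """Homoskedastic OLS variance of b in y = a + b*T + e (pooled residual variance)."""
    n_t, n_c = len(y_t), len(y_c)
    # OLS residuals are deviations from each group's own mean.
    ssr = sum((y - mean(y_t)) ** 2 for y in y_t) + sum((y - mean(y_c)) ** 2 for y in y_c)
    sigma2 = ssr / (n_t + n_c - 2)
    return sigma2 * (1 / n_t + 1 / n_c)


# Hypothetical balanced sample: the two variance estimates coincide.
y_t = [5.0, 7.0, 6.0, 8.0]
y_c = [4.0, 5.0, 3.0, 6.0]
print(impact(y_t, y_c), neyman_variance(y_t, y_c), ols_variance(y_t, y_c))
```

With equal group sizes the two formulas are algebraically identical; with unequal group sizes and unequal group variances they can diverge, which is one reason the comparison is worth checking empirically.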
In social policy evaluations, the multiple testing problem arises because evaluators typically conduct many hypothesis tests across multiple outcomes and subgroups, which can lead to spurious impact findings. This article discusses a framework for addressing this problem that balances Type I and Type II errors.
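The article's own framework is not reproduced here. As a generic illustration of the kinds of adjustments such a framework weighs, the sketch below implements two widely used procedures: Bonferroni, which controls the familywise error rate, and Benjamini-Hochberg, which controls the false discovery rate and typically retains more power. The p-values are hypothetical.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0 for each test whose p-value is at most alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, q=0.05):
    """Step-up procedure: find the largest rank k with p_(k) <= (k / m) * q,
    then reject the k hypotheses with the smallest p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= k_max
    return reject


# Hypothetical p-values from five outcome-by-subgroup tests.
pvals = [0.001, 0.012, 0.02, 0.035, 0.30]
print(bonferroni(pvals))          # only the smallest p-value survives
print(benjamini_hochberg(pvals))  # four of the five tests survive
```

The gap between the two rejection sets is the Type I versus Type II trade-off in miniature: the stricter familywise correction guards against any false positive at the cost of missing effects that may be real.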
For randomized control trials (RCTs) of education interventions, it is often of interest to estimate associations between student outcomes and mediating teacher practice outcomes, to examine the extent to which a study’s conceptual model is supported by data, and to identify the mediators most associated with student learning. This paper develops statistical power formulas for such exploratory analyses under clustered school-based RCTs using ordinary least squares (OLS) and instrumental variable (IV) estimators, and uses these formulas to conduct a simulated power analysis. The power analysis finds that for currently available mediators, the OLS approach will yield precise estimates of associations between teacher practice measures and student test score gains only if the sample contains about 150 to 200 study schools. The IV approach, which can adjust for potential omitted variable and simultaneity biases, has very little statistical power for mediator analyses. For typical RCT evaluations, these results may have design implications for how extensively to collect data on costly teacher practice mediators.
This paper examines the estimation of two-stage clustered randomized controlled trial designs (RCTs) in education research using the Neyman causal inference framework that underlies experiments. The key distinction between the considered causal models is whether potential treatment and control group outcomes are considered to be fixed for the study population (the finite-population model) or randomly selected from a vaguely defined universe (the super-population model). Appropriate estimators are derived and discussed for each model. Using data from five large-scale clustered RCTs in the education area, the empirical analysis estimates impacts and their standard errors using the considered estimators. For all studies, the estimators yield identical findings concerning statistical significance. However, standard errors sometimes differ, suggesting that policy conclusions from RCTs could be sensitive to the choice of estimator. A key recommendation is that analysts test the sensitivity of their impact findings using different estimation methods and cluster-level weighting schemes.
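The choice of cluster-level weights can change the point estimate even before any variance calculation. The sketch below, using made-up data and a hypothetical `cluster_impact` helper, compares an equally weighted estimator (each school counts once) with a size-weighted one (each student counts once). It illustrates only the weighting choice, not the paper's finite-population or super-population estimators.

```python
from statistics import mean


def cluster_impact(clusters, size_weighted=False):
    """Treatment-control difference in weighted averages of school mean outcomes.

    clusters: list of (treated, outcomes) pairs, one per school.
    size_weighted=False gives every school equal weight;
    size_weighted=True weights each school by its number of students.
    """
    def arm_mean(flag):
        arm = [(len(ys), mean(ys)) for treated, ys in clusters if treated == flag]
        weights = [n if size_weighted else 1 for n, _ in arm]
        return sum(w * m for w, (_, m) in zip(weights, arm)) / sum(weights)

    return arm_mean(True) - arm_mean(False)


# Hypothetical data: one large treatment school with high scores.
schools = [
    (True, [10.0, 12.0]),
    (True, [20.0, 20.0, 20.0, 20.0]),
    (False, [10.0, 10.0]),
    (False, [14.0, 14.0]),
]
print(cluster_impact(schools))                      # equal school weights
print(cluster_impact(schools, size_weighted=True))  # equal student weights
```

Here the large, high-scoring treatment school pulls the size-weighted impact (5.0) well above the equally weighted one (3.5). This is the kind of sensitivity the recommended checks are meant to surface.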
Randomized control trials (RCTs) in the education field typically examine the intention-to-treat parameter, which is estimated by comparing the mean outcomes of treatment group members to those of control group members. This report examines the identification and estimation of the complier average causal effect (CACE) parameter, the average impact of intervention services on those who comply with their treatment assignments, under the clustered RCT designs typically used in the education field. The authors used data from ten large-scale RCTs to compare significance findings for the CACE estimator, measured in both nominal and standard deviation units, obtained with the correct variance formulas against those obtained with the formulas typically used in practice. The empirical results indicate that the variance correction terms matter little.
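A common way to recover a CACE from an intention-to-treat (ITT) impact is the Wald (Bloom) estimator. It divides the ITT impact by the treatment-control difference in service receipt and assumes that assignment affects outcomes only through services and that no one receives services because they were assigned to the control group. The sketch below shows the point estimate and the simplified standard error that scales the ITT standard error by the same factor. That simplification ignores sampling error in the take-up rates, which is what the correction terms account for. All numbers and names are hypothetical.

```python
def cace(itt, takeup_t, takeup_c=0.0):
    """Wald/Bloom estimator: ITT impact divided by the difference in service receipt."""
    compliance = takeup_t - takeup_c
    if compliance <= 0:
        raise ValueError("treatment assignment must raise service receipt")
    return itt / compliance


def naive_cace_se(se_itt, takeup_t, takeup_c=0.0):
    """Simplified SE that treats take-up rates as known (omits variance correction terms)."""
    return se_itt / (takeup_t - takeup_c)


# Hypothetical study: 0.12-point ITT impact, 80% take-up among treatments,
# 20% crossover among controls, control-group outcome SD of 1.5 points.
estimate = cace(0.12, 0.80, 0.20)
se = naive_cace_se(0.05, 0.80, 0.20)
sd_control = 1.5
print(estimate, se)               # nominal units
print(estimate / sd_control)      # standard deviation (effect size) units
```

Because the compliance difference sits in the denominator, low take-up inflates both the CACE estimate and its standard error by the same factor.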
This paper presents findings from an experimental evaluation of Job Corps, the nation’s largest training program for disadvantaged youths. The study used survey data collected over four years, as well as tax data collected over nine years, for a nationwide sample of 15,400 treatment and control group members. The Job Corps model has promise: program participation increases educational attainment, reduces criminal activity, and increases earnings for several postprogram years. Based on tax data, however, the earnings gains were not sustained except for the oldest participants. Nonetheless, Job Corps is the only federal training program that has been shown to increase earnings for this population.
Pretest and post-test experimental designs are often used in randomized control trials (RCTs) in the education field to improve the precision of the estimated treatment effects. For logistical reasons, however, pretest data are often collected after random assignment, so including them in the analysis could bias the post-test impact estimates. Deciding whether to collect and use late pretest data in RCTs involves a variance-bias tradeoff. This paper addresses this issue both theoretically and empirically for several commonly used impact estimators, using a loss function approach grounded in the causal inference literature. For RCTs of interventions to improve student test scores, estimators that include late pretests will typically be preferred to estimators that exclude them or that instead include uncontaminated baseline test score data from other sources. This result holds as long as test score impacts do not grow very quickly early in the school year.
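The variance-bias tradeoff can be made concrete with a squared-error loss (bias² + variance), assumed here for illustration; the paper's exact loss function is not reproduced. If the late pretest already reflects a treatment effect d, then adjusting for it shifts the impact estimate by -β·d, where β is the within-group slope of the post-test on the pretest. In exchange, adjustment shrinks the sampling variance by roughly a factor of (1 - R²). The parameter names and values below are hypothetical, and the variance approximation ignores degrees-of-freedom adjustments.

```python
def mse_with_late_pretest(v0, r2, beta, pretest_impact):
    """Approximate squared-error loss of the estimator that adjusts for a late pretest.

    v0: sampling variance of the unadjusted (post-test only) impact estimator
    r2: share of outcome variance explained by the pretest
    beta: within-group regression slope of the post-test on the pretest
    pretest_impact: treatment effect already present in the late pretest
    """
    bias = -beta * pretest_impact      # contamination carried into the adjusted estimate
    variance = v0 * (1 - r2)           # precision gain from the covariate
    return bias ** 2 + variance


def mse_without_pretest(v0):
    """The unadjusted estimator is unbiased, so its loss equals its variance."""
    return v0


def prefer_late_pretest(v0, r2, beta, pretest_impact):
    return mse_with_late_pretest(v0, r2, beta, pretest_impact) < mse_without_pretest(v0)


# Hypothetical effect-size units: SE of 0.1 without the pretest, R^2 = 0.5, slope 0.7.
print(prefer_late_pretest(0.01, 0.5, 0.7, 0.05))  # small early impact: keep the pretest
print(prefer_late_pretest(0.01, 0.5, 0.7, 0.15))  # large early impact: drop it
```

Setting the two losses equal shows that, under these assumptions, the late pretest wins whenever (β·d)² < R²·v0. This condition is consistent with the abstract's caveat about impacts that grow quickly early in the school year.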
This paper examines theoretical and empirical issues related to the statistical power of impact estimates under clustered regression discontinuity (RD) designs. The theory is grounded in the causal inference and hierarchical linear modeling (HLM) literature, and the empirical work focuses on designs commonly used in education research to test intervention effects on student test scores. The main conclusion is that RD designs typically require samples three to four times larger than experimental clustered designs to produce impacts with the same level of statistical precision. The viability of using RD designs for new impact evaluations of educational interventions may therefore be limited and will depend on the point of treatment assignment, the availability of pretests, and the key research questions.
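One way to see where the sample size penalty comes from: if the outcome model is linear in the assignment score (an assumption made here), then the RD impact estimate has a variance that is inflated relative to an experiment by a design effect of 1 / (1 - ρ²). Here ρ is the correlation between treatment status and the score. Under RD that correlation is high by construction. The helper names below are hypothetical.

```python
def pearson_corr(x, y):
    """Population Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    return cov / (vx * vy) ** 0.5


def rd_design_effect(scores, cutoff):
    """Variance inflation of a sharp RD impact estimate relative to an experiment,
    assuming a linear score term: 1 / (1 - rho^2), where rho = corr(treatment, score)."""
    treated = [1.0 if s >= cutoff else 0.0 for s in scores]
    rho = pearson_corr(treated, scores)
    return 1.0 / (1.0 - rho ** 2)


# Scores spread evenly over [0, 1] with the cutoff at the midpoint.
n = 10_000
uniform_scores = [(i + 0.5) / n for i in range(n)]
print(round(rd_design_effect(uniform_scores, 0.5), 2))  # prints 4.0
```

For normally distributed scores with the cutoff at the mean, ρ² = 2/π and the design effect is about 2.75. Under this simple model, then, the required sample multiplier depends on the score distribution and where the cutoff falls, and it lands in roughly the range the abstract reports.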
This article examines the association between program performance measures and long-term program impacts, using nine-year follow-up data from a recent large-scale national experimental evaluation of Job Corps, the nation’s largest federal job training program for disadvantaged youths. The authors find that impacts on key outcomes are not associated with center performance levels. Participants in higher-performing centers had better outcomes; however, the same pattern held for their control group counterparts. The program’s performance measurement system is not achieving the goal of ranking and rewarding centers on the basis of their ability to improve participant outcomes relative to what these outcomes would have been otherwise.
Statistical procedures that correct for multiple testing typically reduce the statistical power of hypothesis tests, that is, the likelihood of identifying real differences between contrasted groups. Researchers disagree about the use of multiple testing procedures and about the appropriate trade-off between Type I error and statistical power (Type II error). These guidelines were developed to handle multiple testing in education research. The report also details the nature of the multiple testing problem and the statistical solutions that have been proposed, the creation of composite outcome measures, and the Bayesian hypothesis testing approach.
This article examines theoretical and empirical issues related to the statistical power of impact estimates for experimental evaluations of education programs. The author considers designs where random assignment is conducted at the school, classroom, or student level, and employs a unified analytic framework using statistical methods from the literature. Focusing on standardized test scores of elementary school students, this article discusses appropriate precision standards and, for each design, the required number of schools to achieve those standards using empirical values of intraclass correlations, regression R² values, and other parameters. Clustering effects vary by design but are typically large. As a result, large school samples are required for education trials, and many evaluations will have sufficient power to detect precise impacts only for relatively large subgroups of sites.
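The standard minimum detectable effect size (MDES) formula for a two-level design with schools randomly assigned shows why clustering drives sample requirements. The sketch below uses that textbook formula with a large-sample normal multiplier (about 2.8 for 80 percent power and a two-sided 5 percent test) and ignores degrees-of-freedom corrections. It is an illustration under those assumptions, not the article's exact calculations, and the parameter values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mdes(n_schools, students_per_school, icc, r2_school=0.0, r2_student=0.0,
         p_treat=0.5, alpha=0.05, power=0.80):
    """MDES in standard deviation units for school-level random assignment."""
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    pq = p_treat * (1 - p_treat)
    # Between-school variance component, reduced by school-level covariates.
    between = icc * (1 - r2_school) / (pq * n_schools)
    # Within-school component, which shrinks as more students are sampled per school.
    within = (1 - icc) * (1 - r2_student) / (pq * n_schools * students_per_school)
    return multiplier * sqrt(between + within)


def required_schools(target, students_per_school, icc, **kwargs):
    """Smallest total number of schools whose MDES is at or below the target."""
    if target <= 0:
        raise ValueError("target MDES must be positive")
    j = 2
    while mdes(j, students_per_school, icc, **kwargs) > target:
        j += 1
    return j


# Hypothetical inputs: ICC of 0.15, 60 tested students per school, covariate R^2 of 0.5.
print(required_schools(0.25, 60, 0.15, r2_school=0.5, r2_student=0.5))
```

Because the between-school term does not shrink as students per school increase, adding students within schools buys little precision once the intraclass correlation is moderate. Only adding schools reduces it.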
Paper presented at the fall 2007 Association for Public Policy Analysis and Management conference held in Washington, DC.
The Early Reading First program provided grants that were designed to enhance teacher practices, instructional content, and classroom environments in preschools to ensure that young children, especially those from low-income families, start school with the skills needed for academic success. This report to Congress presents program impacts on children’s language and literacy skills and on the instructional content and practices in preschool classrooms. The report notes that the program had positive, statistically significant impacts on several classroom and teacher outcomes and on children’s print and letter knowledge.
This article discusses the use of propensity scoring in experimental program evaluations to estimate impacts for subgroups defined by program features and participants' program experiences. The authors discuss estimation issues, provide specification tests, and review an overlooked data collection design—obtaining predictions that program intake staff make about applicants' likely assignments and experiences—that could improve the quality of matched comparison samples. They demonstrate the approach's effectiveness in producing credible subgroup findings using data from Mathematica's Job Corps evaluation.
Presentation for the University of Michigan National Poverty Center conference in Washington, DC.
Notes that the Job Corps evaluators adhered to the Belmont principles in accordance with the evaluation goals mandated by Congress and the realities of using random assignment to evaluate an ongoing program.
Early Head Start (EHS), a federal program begun in 1995 for low-income pregnant women and families with infants and toddlers, was evaluated through a randomized trial of 3,001 families in 17 programs. Interviews with primary caregivers, child assessments, and observations of parent-child interactions were completed when children were three years old. Caregivers were diverse in race-ethnicity, language, and other characteristics. Program children performed better than control children in cognitive and language development, displayed higher emotional engagement of the parent and more sustained attention with play objects, and showed less aggressive behavior. Compared with controls, EHS parents were more emotionally supportive, provided more language and learning stimulation, read to their children more, and spanked less. The strongest and most numerous impacts were for programs that offered a mix of home-visiting and center-based services and that fully implemented the performance standards early.
Every year, thousands of new teachers pass through hundreds of different teacher preparation programs and are hired to teach in the nation’s schools. In recent years, alternative programs have expanded rapidly. Despite the expansion of these new routes into teaching, little research exists to provide guidance on the effectiveness of different teacher training strategies. This report presents the design for Mathematica’s evaluation of these issues.
Head Start, the largest federally funded preschool program, provides comprehensive services to economically disadvantaged children and their families so that children can enter kindergarten ready to succeed in school. Performance standards include requirements for the intensity and quality of a broad range of services for children and families. This report discusses design options for potential future evaluations of Head Start quality enhancements. It describes the goals and activities associated with each of the three stages of research, using specific examples of potential Head Start quality enhancements.
This paper examines theoretical and empirical issues related to the statistical power of impact estimates for experimental evaluations of education programs. It considers designs in which random assignment is conducted at the school, classroom, or student level, using a unified analytic framework based on statistical methods from the literature. Focusing on standardized test scores of elementary school students, the author discusses appropriate precision standards, and, for each design, the required number of schools to achieve these standards. Clustering effects vary by design but are typically large; consequently, large school samples are required, suggesting that most impact studies will only be able to address broad questions rigorously.
Examines the post-PRWORA labor market experiences of low-wage workers, using the Survey of Income and Program Participation (SIPP).